Guard *scanf() return type extension by counter#5641
Conversation
|
can you come up with a test, this change fixes? |
|
Absolutely — I already have a fixture for the exact example from the commit message (mixing positional |
|
Wow |
3b1a8e8 to
c386aab
Compare
|
Tests are green – the phpbench and Infection reds look pre‑existing / unrelated to this change. Let me know if you’d like me to dig into those or if this is good to go. 😊 |
e4293ad to
e38254a
Compare
|
Created a try snippet so it's better to compare with before and now aligned with the fixture that had another blooper: https://phpstan.org/r/087c4c5f-9a00-47cc-8d36-f21fb1b27a31 |
| } | ||
|
|
||
| function sscanfInvalidFormatMixingPositionalWithSequential(string $s) { | ||
| assertType('null', sscanf($s, '%1$s %s')); |
There was a problem hiding this comment.
Shouldn't this be NEVER or Error on PHP8+ since it will throw an error ?
There was a problem hiding this comment.
But given the fact the return type extension already use TypeCombinator::addNull somewhere else, I'm unsure...
There was a problem hiding this comment.
Good point! Yes, on PHP 8+ it should be never (the call throws, 3v4l.org). I kept null for now as a safe lower bound that doesn’t require a PhpVersion dependency in the extension. If you prefer the stricter version, I can inject PhpVersion and return NeverType for ≥8.0.
/E: found the pattern
There was a problem hiding this comment.
But given the fact the return type extension already use
TypeCombinator::addNullsomewhere else, I'm unsure...
... me either but I wanted to give it a try and put a fixup on top with php version id test data files (.php scripts) that mimic the same branching as for explode(). The fixup keeps all options open if you prefer something else – let me know.
|
Initially I thought using the new count helper, we could get rid of a regex path. Since this is not the case, I think the PR as is does not provide much value. We would already report a error for the invalid format. |
|
Thanks for the honest feedback! I’d like to clarify what this small PR actually fixes, because I think there might be a subtle but important distinction. The existing rule already reports an error on an invalid format, yes. But the return type extension runs independently—and previously it would still infer This PR adds a minimal gatekeeper that short‑circuits the type inference for any format the counter finds invalid. It returns So the value isn’t “a more precise return type” in isolation—it’s eliminating a class of false‑positive type information that otherwise pollutes later analysis steps. And it does so with a tiny, proven, self‑contained check that doesn’t touch the regex at all. I’m happy to iterate or, if you’d rather, close this PR and save the gatekeeper for a later round. Just wanted to make sure the motivation is clear. 😊 |
please show a real-world like snippet on phpstan.org/try which proofs this statement |
|
Fair enough – here’s a real‑world‑like snippet: function test(string $s): void {
$result = sscanf($s, '%1$s %s');
if ($result !== null) {
echo count($result);
}
}Right now, PHPStan infers (Note: Let me know if you’d like the snippet turned into a test fixture – happy to add it. |
144d4e0 to
75e26f5
Compare
your example does not contain a false positive as in
|
|
Fair point — I should have said "false-positive type information" rather than "errors", and I stand corrected. The old return type implies the call might succeed; the gatekeeper corrects that. This PR is ready either way. If you'd like to merge it as the first step before fixing the regex, great; if not, I'll close it and revisit later. Just let me know. 😊 |
0161e88 to
89bba3d
Compare
Use the recently fixed `PrintfHelper::getScanfPlaceholdersCount()` to detect invalid format strings in the return type extension. If the format is uncountable, the call is guaranteed to fail – return `NullType` immediately. This is the same approach that eliminated the count regression. It already fixes all false‑positive type inferences for malformed formats. For example, invalid format (mixing positional %n$ with sequential %) returns `null`. Gegenprobe: the counter doesn’t guess – it asks PHP itself.
Return `NeverType` on PHP 8+ (where an invalid format throws
`ValueError`) and `NullType` on PHP < 8.0. This makes the gatekeeper
precise about the call never returning on modern PHP.
The test strategy for the version‑dependent types will be settled
in review – the CI now whispers `*NEVER*` where `null` stood before,
and `array{}|null` where `array|null` stood before.
Regression cases for the function's dynamic return type extension. Taken from the earlier PR. Paint the build red.
In PHP’s sscanf/fscanf, a NUL byte (\0) in the format string terminates parsing at the C level. The return type extension was not truncating the format, so placeholders after a NUL could leak into the inferred type. Truncate the format string at the first NUL byte before matching conversion specifiers in SscanfFunctionDynamicReturnTypeExtension. The counter already handles NUL correctly because the runtime sscanf call stops at the NUL.
Refine return type, refine the test data. Paint the build red again.
When the format string is empty (e.g. after NUL truncation, as can be
seen on 3v4l.org [1]), sscanf returns an empty array, not null, on all
PHP versions. Return the precise `array{}` type instead of falling
through to the default `array{}|null` signature.
[1]: https://3v4l.org/Xpbu5
No need to run the preg_match_all machinery when we already know there are no placeholders – the counter has already told us.
Update the return type extension to correctly handle the remaining specifier patterns found in phpstan's source and testdata corpus, as well as those reported in issues. These are in part LLM inferred from the earlier PR, let's see how far we can recycle them. Will paint the build red for sure, likely more than once, let's see.
Update the return type extension to correctly handle the remaining specifier patterns found in phpstan's source and testdata corpus, as well as those reported in issues. These are in part LLM inferred from the earlier PR, let's see how far we can recycle them. Will paint the build red for sure, likely more than once, let's see. This looks better to me. Red still is the new green.
Update the return type extension to correctly handle the remaining specifier patterns found in phpstan's source and testdata corpus, as well as those reported in issues. - Expand the regex to cover all valid specifiers (`%i`, `%D`, `%g`, ...) and handle length modifiers (`h`, `l`, `L`). - Fix scanset parsing with `%[...]` to allow `]` as the first character inside the set, matching the runtime behavior. - Eliminate false matches of `%%` using the "Black Magic" recipe by NikiC: `(?:%%)+(*SKIP)(*FAIL)|%`. - Tighten string types to `non-empty-string` for `%s`, `%c`, `%[...]`, which always return at least one character on a successful match. - Map `%u` to `int|non-falsy-string` to account for its overflow behavior. These changes build on the previously merged counter fix and the empty‑format / NUL‑byte guard.
|
I’ve been thinking about your hope to use the counter to circumvent the regex path. This PR now takes a first concrete step in that direction: the NUL‑byte fix and the builder lift make the counter the foundation for the array shape, and the regex overhaul (applied here) fixes all the known missing‑specifier cases. I’ve also recycled the test cases from the earlier bot‑generated PR to prove the fixes. I’d love to hear if this incremental approach aligns with what you had in mind – the regex is still here for precision, but the counter now guarantees correctness. Happy to adjust as you prefer. |
Use the recently fixed
PrintfHelper::getScanfPlaceholdersCount()to detect invalid format strings in the return type extension. If the format is uncountable, the call is guaranteed to fail – returnNullTypeimmediately.This is the same approach that eliminated the count regression. It already fixes all false‑positive type inferences for malformed formats.
For example, invalid format (mixing positional %n$ with sequential %) returns
null.